JSON Manipulation in Polars

Python
Polars
JSON
Author

Bridger Norman

Published

October 10, 2024

The other day I was tasked with manipulating a JSON formatted data column in a csv. I need to access some information that was deeply embedding into the daunting JSON format. I turned to my favorite python wrangling tool polars and began learning. I have recorded what I learned here for future reference and to help those you may be tasked with a similar issue.

decode the json column with str.json_decode() this converts the text to a struct which in polars is practically json

next if the list has {} around it use .unnest() to make the first layer of the json the columns note: the data type will look something like this struct[22]

next if the list has [] around it use .explode() to remove the items out of the list. note: the data type will look something like this list[struct[11]]

use the polars .select() function in between the .unnest() functions so that the data knows how to get wider. This works exactly like a SQL select